options(knitr.duplicate.label = 'allow')
1. Here are some correlations between the variables:
human <- read.table("http://s3.amazonaws.com/assets.datacamp.com/production/course_2218/datasets/human2.txt", sep= ",", header=TRUE, row.names = 1)
library(GGally)
## Loading required package: ggplot2
library(corrplot)
## corrplot 0.84 loaded
library(dplyr)
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:GGally':
##
## nasa
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(tidyr)
ggpairs(human)
cor(human)
## Edu2.FM Labo.FM Edu.Exp Life.Exp GNI
## Edu2.FM 1.000000000 0.009564039 0.59325156 0.5760299 0.43030485
## Labo.FM 0.009564039 1.000000000 0.04732183 -0.1400125 -0.02173971
## Edu.Exp 0.593251562 0.047321827 1.00000000 0.7894392 0.62433940
## Life.Exp 0.576029853 -0.140012504 0.78943917 1.0000000 0.62666411
## GNI 0.430304846 -0.021739705 0.62433940 0.6266641 1.00000000
## Mat.Mor -0.660931770 0.240461075 -0.73570257 -0.8571684 -0.49516234
## Ado.Birth -0.529418415 0.120158862 -0.70356489 -0.7291774 -0.55656208
## Parli.F 0.078635285 0.250232608 0.20608156 0.1700863 0.08920818
## Mat.Mor Ado.Birth Parli.F
## Edu2.FM -0.6609318 -0.5294184 0.07863528
## Labo.FM 0.2404611 0.1201589 0.25023261
## Edu.Exp -0.7357026 -0.7035649 0.20608156
## Life.Exp -0.8571684 -0.7291774 0.17008631
## GNI -0.4951623 -0.5565621 0.08920818
## Mat.Mor 1.0000000 0.7586615 -0.08944000
## Ado.Birth 0.7586615 1.0000000 -0.07087810
## Parli.F -0.0894400 -0.0708781 1.00000000
dim(human)
## [1] 155 8
str(human)
## 'data.frame': 155 obs. of 8 variables:
## $ Edu2.FM : num 1.007 0.997 0.983 0.989 0.969 ...
## $ Labo.FM : num 0.891 0.819 0.825 0.884 0.829 ...
## $ Edu.Exp : num 17.5 20.2 15.8 18.7 17.9 16.5 18.6 16.5 15.9 19.2 ...
## $ Life.Exp : num 81.6 82.4 83 80.2 81.6 80.9 80.9 79.1 82 81.8 ...
## $ GNI : int 64992 42261 56431 44025 45435 43919 39568 52947 42155 32689 ...
## $ Mat.Mor : int 4 6 6 5 6 7 9 28 11 8 ...
## $ Ado.Birth: num 7.8 12.1 1.9 5.1 6.2 3.8 8.2 31 14.5 25.3 ...
## $ Parli.F : num 39.6 30.5 28.5 38 36.9 36.9 19.9 19.4 28.2 31.4 ...
colnames(human)
## [1] "Edu2.FM" "Labo.FM" "Edu.Exp" "Life.Exp" "GNI" "Mat.Mor"
## [7] "Ado.Birth" "Parli.F"
head(human)
## Edu2.FM Labo.FM Edu.Exp Life.Exp GNI Mat.Mor Ado.Birth
## Norway 1.0072389 0.8908297 17.5 81.6 64992 4 7.8
## Australia 0.9968288 0.8189415 20.2 82.4 42261 6 12.1
## Switzerland 0.9834369 0.8251001 15.8 83.0 56431 6 1.9
## Denmark 0.9886128 0.8840361 18.7 80.2 44025 5 5.1
## Netherlands 0.9690608 0.8286119 17.9 81.6 45435 6 6.2
## Germany 0.9927835 0.8072289 16.5 80.9 43919 7 3.8
## Parli.F
## Norway 39.6
## Australia 30.5
## Switzerland 28.5
## Denmark 38.0
## Netherlands 36.9
## Germany 36.9
The dataset used in this exercise has 155 observations and eight variables. “GNI” and “Mat.Mor” are stored as integers; the other six variables are numeric (double). The following table explains what each variable contains.
Variable - Explanation
Edu2.FM - ratio of females and males with secondary education
Labo.FM - ratio of females and males in the labour force
Edu.Exp - expected years of schooling
Life.Exp - life expectancy at birth
GNI - gross national income per capita
Mat.Mor - maternal mortality ratio
Ado.Birth - adolescent birth rate
Parli.F - percentage of female representatives in parliament
options(knitr.duplicate.label = 'allow', debug = TRUE)
library(pander)
##
## Attaching package: 'pander'
## The following object is masked from 'package:GGally':
##
## wrap
pandoc.table(summary(human), caption = "Summary of Human data", split.table = 80)
##
## -----------------------------------------------------------------
## Edu2.FM Labo.FM Edu.Exp Life.Exp
## ---------------- ---------------- --------------- ---------------
## Min. :0.1717 Min. :0.1857 Min. : 5.40 Min. :49.00
##
## 1st Qu.:0.7264 1st Qu.:0.5984 1st Qu.:11.25 1st Qu.:66.30
##
## Median :0.9375 Median :0.7535 Median :13.50 Median :74.20
##
## Mean :0.8529 Mean :0.7074 Mean :13.18 Mean :71.65
##
## 3rd Qu.:0.9968 3rd Qu.:0.8535 3rd Qu.:15.20 3rd Qu.:77.25
##
## Max. :1.4967 Max. :1.0380 Max. :20.20 Max. :83.50
## -----------------------------------------------------------------
##
## Table: Summary of Human data (continued below)
##
##
## ------------------------------------------------------------------
## GNI Mat.Mor Ado.Birth Parli.F
## ---------------- ---------------- ---------------- ---------------
## Min. : 581 Min. : 1.0 Min. : 0.60 Min. : 0.00
##
## 1st Qu.: 4198 1st Qu.: 11.5 1st Qu.: 12.65 1st Qu.:12.40
##
## Median : 12040 Median : 49.0 Median : 33.60 Median :19.30
##
## Mean : 17628 Mean : 149.1 Mean : 47.16 Mean :20.91
##
## 3rd Qu.: 24512 3rd Qu.: 190.0 3rd Qu.: 71.95 3rd Qu.:27.95
##
## Max. :123124 Max. :1100.0 Max. :204.80 Max. :57.50
## ------------------------------------------------------------------
ggpairs(human, mapping = aes(alpha = 0.3), lower = list(combo = wrap("facethist")))
The summary and the correlation matrix show some interesting relationships between the variables. The adolescent birth rate (Ado.Birth) is positively correlated (0.759) with the maternal mortality ratio (Mat.Mor) but negatively correlated (-0.729) with life expectancy at birth (Life.Exp). Similarly, the ratio of females to males with secondary education (Edu2.FM) and the expected years of schooling (Edu.Exp) are both positively correlated with life expectancy at birth (0.576 and 0.789). On the other hand, the ratio of females to males in the labour force (Labo.FM) is almost uncorrelated with “Edu.Exp” (0.047) and “GNI” (-0.022).
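Reading individual values out of an 8 x 8 correlation matrix is error-prone, so it can help to list the strongest pairs programmatically. The following is a minimal sketch: `top_correlations` is a hypothetical helper written for illustration (it is not part of any package), and the built-in `mtcars` data is used only as a stand-in so that the example runs without downloading anything.

```r
# List the variable pairs with the largest absolute Pearson correlation.
# `top_correlations` is a hypothetical helper, not a package function.
top_correlations <- function(df, n = 5) {
  cm <- cor(df)
  # Look only at the upper triangle so that each pair appears once.
  idx <- which(upper.tri(cm), arr.ind = TRUE)
  pairs <- data.frame(
    var1 = rownames(cm)[idx[, "row"]],
    var2 = colnames(cm)[idx[, "col"]],
    r    = cm[idx],
    stringsAsFactors = FALSE
  )
  # Sort by strength of association, ignoring the sign.
  pairs <- pairs[order(abs(pairs$r), decreasing = TRUE), ]
  rownames(pairs) <- NULL
  head(pairs, n)
}

# Stand-in data (assumption): mtcars ships with base R.
top_mtcars <- top_correlations(mtcars, n = 3)
print(top_mtcars)
```

With the data loaded in this report, the equivalent call would be `top_correlations(human)`.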
PCA and a biplot (in a couple of different ways)
In this section we run a principal component analysis (PCA), summarize the principal components, and draw biplots. PCA is run first on the non-standardized data and then on the standardized data.
pca_human<-prcomp(human)
biplot(pca_human, choices = 1:2, cex=c(0.8,1), col=c("grey40", "deeppink2"))
## Warning in arrows(0, 0, y[, 1L] * 0.8, y[, 2L] * 0.8, col = col[2L], length
## = arrow.len): zero-length arrow is of indeterminate angle and so skipped
## Warning in arrows(0, 0, y[, 1L] * 0.8, y[, 2L] * 0.8, col = col[2L], length
## = arrow.len): zero-length arrow is of indeterminate angle and so skipped
## Warning in arrows(0, 0, y[, 1L] * 0.8, y[, 2L] * 0.8, col = col[2L], length
## = arrow.len): zero-length arrow is of indeterminate angle and so skipped
## Warning in arrows(0, 0, y[, 1L] * 0.8, y[, 2L] * 0.8, col = col[2L], length
## = arrow.len): zero-length arrow is of indeterminate angle and so skipped
sum_pca_human<-summary(pca_human)
sum_pca_human
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 1.854e+04 185.5219 25.19 11.45 3.766 1.566 0.1912
## Proportion of Variance 9.999e-01 0.0001 0.00 0.00 0.000 0.000 0.0000
## Cumulative Proportion 9.999e-01 1.0000 1.00 1.00 1.000 1.000 1.0000
## PC8
## Standard deviation 0.1591
## Proportion of Variance 0.0000
## Cumulative Proportion 1.0000
sum_pca_human_var<-sum_pca_human$sdev^2
sum_pca_human_var
## [1] 3.438860e+08 3.441836e+04 6.343853e+02 1.312035e+02 1.418457e+01
## [6] 2.452081e+00 3.655943e-02 2.531638e-02
pca_pr <- round(100*sum_pca_human$importance[2, ], digits = 1)
pc_lab<-paste0(names(pca_pr), " (", pca_pr, "%)")
biplot(pca_human, cex = c(0.8, 1), col = c("grey40", "deeppink2"), xlab = pc_lab[1], ylab = pc_lab[2], main = "PCA plot of non-scaled human data")
## Warning in arrows(0, 0, y[, 1L] * 0.8, y[, 2L] * 0.8, col = col[2L], length
## = arrow.len): zero-length arrow is of indeterminate angle and so skipped
## Warning in arrows(0, 0, y[, 1L] * 0.8, y[, 2L] * 0.8, col = col[2L], length
## = arrow.len): zero-length arrow is of indeterminate angle and so skipped
## Warning in arrows(0, 0, y[, 1L] * 0.8, y[, 2L] * 0.8, col = col[2L], length
## = arrow.len): zero-length arrow is of indeterminate angle and so skipped
## Warning in arrows(0, 0, y[, 1L] * 0.8, y[, 2L] * 0.8, col = col[2L], length
## = arrow.len): zero-length arrow is of indeterminate angle and so skipped
#biplot(pca_human, choices = 1:2, cex = c(1, 1), col = c("grey40", "deeppink2"),sub = "PC1 & PC2 with non-standardised dataset")
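The PCA biplot above does not provide meaningful insight into the data, because a single variable, “GNI”, dominates it. “GNI” is measured in much larger units than the other variables, so its variance is orders of magnitude larger, and PC1, which captures 99.99% of the total variance, is essentially “GNI” alone. The arrows of the other variables are close to zero length, which is also what the warnings above report.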
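The effect of one large-scale variable can be checked directly by comparing each variable's share of the total variance with the PC1 loadings. The sketch below uses the built-in `USArrests` data purely as a stand-in (an assumption made so that the example is self-contained); its `Assault` column plays the role that “GNI” plays in the human data.

```r
# USArrests ships with base R and is only a stand-in for the human data:
# its Assault column is on a much larger scale than the other columns.
x <- USArrests

# Share of the total variance contributed by each raw variable.
var_share <- sapply(x, var) / sum(sapply(x, var))
print(round(var_share, 3))

# PCA on the raw (non-standardized) data.
pca_raw <- prcomp(x)

# Proportion of variance explained by PC1.
pc1_prop <- summary(pca_raw)$importance["Proportion of Variance", "PC1"]
print(pc1_prop)

# PC1 loadings: the high-variance column carries almost all the weight.
print(round(pca_raw$rotation[, "PC1"], 3))
```

Running `sapply(human, var)` on the data in this report should show the same pattern, with “GNI” far ahead of the other variables.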
Next, we will scale the variables in the human data and compute principal components and plot the results.
human_std <- scale(human)
pca_human_std <- prcomp(human_std)
biplot(pca_human_std, choices = 1:2, cex = c(0.8, 1), col = c("grey40", "deeppink2"))
pca_human_s<-prcomp(human, scale. = TRUE)
sum_pca_human_s<-summary(pca_human_s)
pca_pr_s <- round(100*sum_pca_human_s$importance[2, ], digits = 1)
pc_lab<-paste0(names(pca_pr_s), " (", pca_pr_s, "%)")
sum_pca_human_var_s<-sum_pca_human_s$sdev^2
sum_pca_human_var_s
## [1] 4.2883701 1.2989625 0.7657100 0.6066276 0.4381862 0.2876242 0.2106805
## [8] 0.1038390
biplot(pca_human_s, cex = c(0.8, 1), col = c("grey40", "deeppink2"), xlab = pc_lab[1], ylab = pc_lab[2], main = "PCA plot of scaled human data")
After standardization the biplot looks very different, and so do the results. PCA finds the directions of maximal variance, so it is sensitive to the scale of the variables: a variable measured in large units has a large variance and is treated as more important than variables with smaller variances. In the non-scaled PCA plot this is what happened with “GNI”, whose values are far larger than those of the other variables. Scaling gives every variable unit variance, so each contributes on an equal footing. The first principal component (PC1) now explains 53.6% of the variation (4.288 of a total of 8), compared with essentially 100% (99.99%) when the data was not scaled.
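The percentages on the axis labels can be reproduced by hand from the component variances printed above. The sketch below simply copies those eight values from the output of the scaled PCA; the variable names `eig_scaled`, `total` and `pct` are chosen here for illustration.

```r
# Component variances (squared standard deviations) copied from the
# printed output of the scaled PCA above.
eig_scaled <- c(4.2883701, 1.2989625, 0.7657100, 0.6066276,
                0.4381862, 0.2876242, 0.2106805, 0.1038390)

# With standardized variables every column has variance 1, so the
# component variances sum to the number of variables (8 here).
total <- sum(eig_scaled)
print(total)

# Percentage of variance per component and the running total.
pct <- round(100 * eig_scaled / total, 1)
names(pct) <- paste0("PC", seq_along(pct))
print(pct)
print(cumsum(pct))
```

PC1 comes out at 53.6% and PC2 at 16.2%, so the two axes of the biplot together show roughly 70% of the variation in the standardized data.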
Interpreting the two principal component dimensions: (1). Correlations between variables: The smaller the angle between two arrows, the stronger the positive correlation between the corresponding variables. On this reading, four of the variables, “Edu.Exp”, “Life.Exp”, “GNI” and “Edu2.FM”, are positively correlated with each other. In the plot, the arrows of “GNI” and “Edu2.FM” lie closest together, although the correlation matrix shows that the strongest pairwise correlation within this group is between “Edu.Exp” and “Life.Exp” (0.789). In the same way, “Parli.F” and “Labo.FM” are correlated with each other (0.250), as are “Mat.Mor” and “Ado.Birth” (0.759). The arrows of “Life.Exp” and “Ado.Birth” point in nearly opposite directions; this large angle indicates a strong negative correlation (-0.729), not a lack of correlation. Pairs that are close to uncorrelated, such as “Labo.FM” and “GNI” (-0.022), should instead appear with arrows at roughly right angles.
(2). Correlation between variables and principal components: The smaller the angle between an arrow and a principal component axis, the more strongly that variable is correlated with the component. “Parli.F” and “Labo.FM” are nearly uncorrelated with the other six variables, so their arrows align with PC2 and they contribute mainly to its direction. The remaining variables align with PC1: “Edu.Exp”, “Life.Exp”, “GNI” and “Edu2.FM” point one way along it, and “Mat.Mor” and “Ado.Birth” point the opposite way. These six variables therefore carry the largest weights on PC1, which is consistent with their strong mutual correlations and with PC1 explaining 53.6% of the variance.
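Both reading rules above can be verified numerically. For standardized data, multiplying each loading column by the component's standard deviation gives the correlation between each variable and each component, and the inner products of these vectors over all components reproduce the correlation matrix. The sketch below uses the built-in `USArrests` data as a stand-in (an assumption, so that the example is self-contained); the names `arrow_coords`, `cor_full` and `cor_2d` are illustrative. As far as I can tell, the arrows drawn by the default `biplot()` scaling are proportional to these coordinates, but treat that as an assumption.

```r
# USArrests (bundled with R) stands in for the human data here.
x_std <- scale(USArrests)
pca <- prcomp(x_std)

# Scale each loading column by the standard deviation of its component.
# For standardized data these are the variable-to-component correlations.
arrow_coords <- sweep(pca$rotation, 2, pca$sdev, "*")
var_pc_cor <- cor(x_std, pca$x)

# Using all components, the inner products of the arrow vectors give
# the correlation matrix exactly.
cor_full <- arrow_coords %*% t(arrow_coords)

# Using only PC1 and PC2, as in a biplot, gives an approximation.
cor_2d <- arrow_coords[, 1:2] %*% t(arrow_coords[, 1:2])

print(round(var_pc_cor[, 1:2], 2))
print(round(cor_full - cor(USArrests), 10))
print(round(cor_2d - cor(USArrests), 2))
```

The last matrix shows how much information about the correlations is lost by looking at two dimensions only, which is why angles in a biplot should be read as approximate.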
We will use tea data from the FactoMineR package to practice multiple correspondence analysis (MCA). In this data, there are 300 observations and 36 variables.
library(FactoMineR)
data("tea")
str(tea)
## 'data.frame': 300 obs. of 36 variables:
## $ breakfast : Factor w/ 2 levels "breakfast","Not.breakfast": 1 1 2 2 1 2 1 2 1 1 ...
## $ tea.time : Factor w/ 2 levels "Not.tea time",..: 1 1 2 1 1 1 2 2 2 1 ...
## $ evening : Factor w/ 2 levels "evening","Not.evening": 2 2 1 2 1 2 2 1 2 1 ...
## $ lunch : Factor w/ 2 levels "lunch","Not.lunch": 2 2 2 2 2 2 2 2 2 2 ...
## $ dinner : Factor w/ 2 levels "dinner","Not.dinner": 2 2 1 1 2 1 2 2 2 2 ...
## $ always : Factor w/ 2 levels "always","Not.always": 2 2 2 2 1 2 2 2 2 2 ...
## $ home : Factor w/ 2 levels "home","Not.home": 1 1 1 1 1 1 1 1 1 1 ...
## $ work : Factor w/ 2 levels "Not.work","work": 1 1 2 1 1 1 1 1 1 1 ...
## $ tearoom : Factor w/ 2 levels "Not.tearoom",..: 1 1 1 1 1 1 1 1 1 2 ...
## $ friends : Factor w/ 2 levels "friends","Not.friends": 2 2 1 2 2 2 1 2 2 2 ...
## $ resto : Factor w/ 2 levels "Not.resto","resto": 1 1 2 1 1 1 1 1 1 1 ...
## $ pub : Factor w/ 2 levels "Not.pub","pub": 1 1 1 1 1 1 1 1 1 1 ...
## $ Tea : Factor w/ 3 levels "black","Earl Grey",..: 1 1 2 2 2 2 2 1 2 1 ...
## $ How : Factor w/ 4 levels "alone","lemon",..: 1 3 1 1 1 1 1 3 3 1 ...
## $ sugar : Factor w/ 2 levels "No.sugar","sugar": 2 1 1 2 1 1 1 1 1 1 ...
## $ how : Factor w/ 3 levels "tea bag","tea bag+unpackaged",..: 1 1 1 1 1 1 1 1 2 2 ...
## $ where : Factor w/ 3 levels "chain store",..: 1 1 1 1 1 1 1 1 2 2 ...
## $ price : Factor w/ 6 levels "p_branded","p_cheap",..: 4 6 6 6 6 3 6 6 5 5 ...
## $ age : int 39 45 47 23 48 21 37 36 40 37 ...
## $ sex : Factor w/ 2 levels "F","M": 2 1 1 2 2 2 2 1 2 2 ...
## $ SPC : Factor w/ 7 levels "employee","middle",..: 2 2 4 6 1 6 5 2 5 5 ...
## $ Sport : Factor w/ 2 levels "Not.sportsman",..: 2 2 2 1 2 2 2 2 2 1 ...
## $ age_Q : Factor w/ 5 levels "15-24","25-34",..: 3 4 4 1 4 1 3 3 3 3 ...
## $ frequency : Factor w/ 4 levels "1/day","1 to 2/week",..: 1 1 3 1 3 1 4 2 3 3 ...
## $ escape.exoticism: Factor w/ 2 levels "escape-exoticism",..: 2 1 2 1 1 2 2 2 2 2 ...
## $ spirituality : Factor w/ 2 levels "Not.spirituality",..: 1 1 1 2 2 1 1 1 1 1 ...
## $ healthy : Factor w/ 2 levels "healthy","Not.healthy": 1 1 1 1 2 1 1 1 2 1 ...
## $ diuretic : Factor w/ 2 levels "diuretic","Not.diuretic": 2 1 1 2 1 2 2 2 2 1 ...
## $ friendliness : Factor w/ 2 levels "friendliness",..: 2 2 1 2 1 2 2 1 2 1 ...
## $ iron.absorption : Factor w/ 2 levels "iron absorption",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ feminine : Factor w/ 2 levels "feminine","Not.feminine": 2 2 2 2 2 2 2 1 2 2 ...
## $ sophisticated : Factor w/ 2 levels "Not.sophisticated",..: 1 1 1 2 1 1 1 2 2 1 ...
## $ slimming : Factor w/ 2 levels "No.slimming",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ exciting : Factor w/ 2 levels "exciting","No.exciting": 2 1 2 2 2 2 2 2 2 2 ...
## $ relaxing : Factor w/ 2 levels "No.relaxing",..: 1 1 2 2 2 2 2 2 2 2 ...
## $ effect.on.health: Factor w/ 2 levels "effect on health",..: 2 2 2 2 2 2 2 2 2 2 ...
dim(tea)
## [1] 300 36
summary(tea)
## breakfast tea.time evening lunch
## breakfast :144 Not.tea time:131 evening :103 lunch : 44
## Not.breakfast:156 tea time :169 Not.evening:197 Not.lunch:256
##
##
##
##
##
## dinner always home work
## dinner : 21 always :103 home :291 Not.work:213
## Not.dinner:279 Not.always:197 Not.home: 9 work : 87
##
##
##
##
##
## tearoom friends resto pub
## Not.tearoom:242 friends :196 Not.resto:221 Not.pub:237
## tearoom : 58 Not.friends:104 resto : 79 pub : 63
##
##
##
##
##
## Tea How sugar how
## black : 74 alone:195 No.sugar:155 tea bag :170
## Earl Grey:193 lemon: 33 sugar :145 tea bag+unpackaged: 94
## green : 33 milk : 63 unpackaged : 36
## other: 9
##
##
##
## where price age sex
## chain store :192 p_branded : 95 Min. :15.00 F:178
## chain store+tea shop: 78 p_cheap : 7 1st Qu.:23.00 M:122
## tea shop : 30 p_private label: 21 Median :32.00
## p_unknown : 12 Mean :37.05
## p_upscale : 53 3rd Qu.:48.00
## p_variable :112 Max. :90.00
##
## SPC Sport age_Q frequency
## employee :59 Not.sportsman:121 15-24:92 1/day : 95
## middle :40 sportsman :179 25-34:69 1 to 2/week: 44
## non-worker :64 35-44:40 +2/day :127
## other worker:20 45-59:61 3 to 6/week: 34
## senior :35 +60 :38
## student :70
## workman :12
## escape.exoticism spirituality healthy
## escape-exoticism :142 Not.spirituality:206 healthy :210
## Not.escape-exoticism:158 spirituality : 94 Not.healthy: 90
##
##
##
##
##
## diuretic friendliness iron.absorption
## diuretic :174 friendliness :242 iron absorption : 31
## Not.diuretic:126 Not.friendliness: 58 Not.iron absorption:269
##
##
##
##
##
## feminine sophisticated slimming
## feminine :129 Not.sophisticated: 85 No.slimming:255
## Not.feminine:171 sophisticated :215 slimming : 45
##
##
##
##
##
## exciting relaxing effect.on.health
## exciting :116 No.relaxing:113 effect on health : 66
## No.exciting:184 relaxing :187 No.effect on health:234
##
##
##
##
##
library(tidyr)
library(dplyr)
keep<- c("breakfast","tea.time","friends","frequency","Tea","sugar","sex","sophisticated")
my_tea <- dplyr::select(tea, one_of(keep))
gather(my_tea) %>% ggplot(aes(value)) + geom_bar() + theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 8)) + facet_wrap("key", scales = "free")
## Warning: attributes are not identical across measure variables;
## they will be dropped
mca_tea <- MCA(my_tea, graph=FALSE)
summary(mca_tea, nbelements=Inf, nbind=5)
##
## Call:
## MCA(X = my_tea, graph = FALSE)
##
##
## Eigenvalues
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5 Dim.6
## Variance 0.213 0.189 0.159 0.136 0.131 0.118
## % of var. 15.481 13.717 11.556 9.865 9.518 8.606
## Cumulative % of var. 15.481 29.198 40.754 50.619 60.137 68.743
## Dim.7 Dim.8 Dim.9 Dim.10 Dim.11
## Variance 0.112 0.093 0.091 0.072 0.061
## % of var. 8.150 6.766 6.644 5.254 4.444
## Cumulative % of var. 76.893 83.658 90.302 95.556 100.000
##
## Individuals (the 5 first)
## Dim.1 ctr cos2 Dim.2 ctr cos2 Dim.3
## 1 | 0.359 0.202 0.071 | 1.116 2.201 0.686 | -0.040
## 2 | -0.198 0.061 0.023 | 0.845 1.261 0.419 | 0.349
## 3 | -0.484 0.367 0.226 | -0.243 0.105 0.057 | -0.211
## 4 | 0.779 0.951 0.499 | 0.345 0.210 0.098 | -0.071
## 5 | -0.065 0.007 0.003 | 0.816 1.176 0.480 | -0.026
## ctr cos2
## 1 0.003 0.001 |
## 2 0.255 0.071 |
## 3 0.094 0.043 |
## 4 0.011 0.004 |
## 5 0.001 0.000 |
##
## Categories
## Dim.1 ctr cos2 v.test Dim.2 ctr
## breakfast | -0.545 8.384 0.275 -9.060 | 0.576 10.563
## Not.breakfast | 0.503 7.739 0.275 9.060 | -0.532 9.750
## Not.tea time | 0.663 11.263 0.340 10.090 | 0.345 3.447
## tea time | -0.514 8.730 0.340 -10.090 | -0.268 2.672
## friends | -0.115 0.504 0.025 -2.721 | -0.375 6.083
## Not.friends | 0.216 0.950 0.025 2.721 | 0.706 11.465
## 1/day | 0.296 1.631 0.041 3.487 | 0.609 7.774
## 1 to 2/week | 1.072 9.899 0.198 7.686 | -1.161 13.109
## +2/day | -0.727 13.148 0.388 -10.775 | 0.105 0.308
## 3 to 6/week | 0.502 1.674 0.032 3.100 | -0.589 2.607
## black | -0.394 2.246 0.051 -3.896 | 0.301 1.477
## Earl Grey | 0.030 0.034 0.002 0.701 | -0.174 1.295
## green | 0.707 3.224 0.062 4.295 | 0.345 0.869
## No.sugar | -0.467 6.621 0.233 -8.352 | -0.031 0.033
## sugar | 0.499 7.078 0.233 8.352 | 0.033 0.035
## F | -0.443 6.832 0.286 -9.249 | -0.357 5.014
## M | 0.646 9.969 0.286 9.249 | 0.521 7.315
## Not.sophisticated | -0.056 0.052 0.001 -0.606 | 0.786 11.599
## sophisticated | 0.022 0.020 0.001 0.606 | -0.311 4.586
## cos2 v.test Dim.3 ctr cos2 v.test
## breakfast 0.306 9.573 | -0.244 2.256 0.055 -4.060 |
## Not.breakfast 0.306 -9.573 | 0.226 2.082 0.055 4.060 |
## Not.tea time 0.092 5.254 | 0.157 0.844 0.019 2.386 |
## tea time 0.092 -5.254 | -0.121 0.654 0.019 -2.386 |
## friends 0.265 -8.898 | -0.294 4.448 0.163 -6.983 |
## Not.friends 0.265 8.898 | 0.554 8.382 0.163 6.983 |
## 1/day 0.172 7.164 | -0.206 1.058 0.020 -2.426 |
## 1 to 2/week 0.232 -8.325 | 0.110 0.139 0.002 0.787 |
## +2/day 0.008 1.552 | 0.021 0.015 0.000 0.312 |
## 3 to 6/week 0.044 -3.642 | 0.355 1.123 0.016 2.194 |
## black 0.030 2.974 | 0.821 13.085 0.221 8.125 |
## Earl Grey 0.055 -4.047 | -0.535 14.485 0.516 -12.424 |
## green 0.015 2.098 | 1.287 14.344 0.205 7.827 |
## No.sugar 0.001 -0.552 | 0.568 13.095 0.344 10.148 |
## sugar 0.001 0.552 | -0.607 13.998 0.344 -10.148 |
## F 0.186 -7.458 | 0.027 0.035 0.001 0.569 |
## M 0.186 7.458 | -0.040 0.051 0.001 -0.569 |
## Not.sophisticated 0.244 8.545 | -0.564 7.100 0.126 -6.136 |
## sophisticated 0.244 -8.545 | 0.223 2.807 0.126 6.136 |
##
## Categorical variables (eta2)
## Dim.1 Dim.2 Dim.3
## breakfast | 0.275 0.306 0.055 |
## tea.time | 0.340 0.092 0.019 |
## friends | 0.025 0.265 0.163 |
## frequency | 0.449 0.359 0.030 |
## Tea | 0.094 0.055 0.533 |
## sugar | 0.233 0.001 0.344 |
## sex | 0.286 0.186 0.001 |
## sophisticated | 0.001 0.244 0.126 |
plot(mca_tea, invisible = c("ind"), habillage = "quali", sub = "MCA of tea dataset")
In general, the MCA plot places categories that tend to be chosen by the same respondents close to each other, so nearby categories can be read as similar. The categories “tea time” and “friends” lie close together, as do “Not.friends” and “Not.tea time”. In other words, respondents who drink tea with friends tend to do so at tea time, and those who do not drink tea at tea time tend not to drink it with friends. The category “F” falls on the same side as “tea time”, “friends” and “No.sugar”, while “M” falls on the side of “Not.tea time”, “Not.friends” and “sugar”. This suggests that the women in this sample more often drink tea socially and without sugar than the men do. These readings are tentative, since the first two dimensions together explain only 29.2% of the variance.
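The eigenvalue table in the MCA summary can be sanity-checked by hand. For an MCA on K categorical variables with J categories in total, the number of dimensions is at most J - K and the eigenvalues sum to J/K - 1. The sketch below plugs in the level counts read from `str(tea)` for the eight selected variables and the rounded eigenvalues from the summary; the variable names are illustrative.

```r
# Number of levels of each variable kept in my_tea, read off str(tea).
levels_per_var <- c(breakfast = 2, tea.time = 2, friends = 2, frequency = 4,
                    Tea = 3, sugar = 2, sex = 2, sophisticated = 2)

K <- length(levels_per_var)  # number of variables
J <- sum(levels_per_var)     # total number of categories

n_dims <- J - K              # maximum number of MCA dimensions
total_inertia <- J / K - 1   # sum of all eigenvalues

# Eigenvalues as printed (rounded to 3 decimals) in the MCA summary.
eig <- c(0.213, 0.189, 0.159, 0.136, 0.131, 0.118,
         0.112, 0.093, 0.091, 0.072, 0.061)

print(c(dimensions = n_dims, total_inertia = total_inertia))
print(sum(eig))
print(round(100 * eig / total_inertia, 1))
```

This gives 11 dimensions and a total inertia of 1.375, matching the summary. The percentages differ from the printed ones only in the last digit because the eigenvalues used here are already rounded.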